Goto

Collaborating Authors

 Rush County


HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning

Tan, Xingyu, Wang, Xiaoyang, Liu, Qing, Xu, Xiwei, Yuan, Xin, Zhu, Liming, Zhang, Wenjie

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Current hybrid RAG system retrieves evidence from both knowledge graphs (KGs) and text documents to support LLM reasoning. However, it faces challenges like handling multi-hop reasoning, multi-entity questions, multi-source verification, and effective graph utilization. To address these limitations, we present HydraRAG, a training-free framework that unifies graph topology, document semantics, and source reliability to support deep, faithful reasoning in LLMs. HydraRAG handles multi-hop and multi-entity problems through agent-driven exploration that combines structured and unstructured retrieval, increasing both diversity and precision of evidence. To tackle multi-source verification, HydraRAG uses a tri-factor cross-source verification (source trustworthiness assessment, cross-source corroboration, and entity-path alignment), to balance topic relevance with cross-modal agreement. By leveraging graph structure, HydraRAG fuses heterogeneous sources, guides efficient exploration, and prunes noise early. Comprehensive experiments on seven benchmark datasets show that HydraRAG achieves overall state-of-the-art results on all benchmarks with GPT-3.5-Turbo, outperforming the strong hybrid baseline ToG-2 by an average of 20.3% and up to 30.1%. Furthermore, HydraRAG enables smaller models (e.g., Llama-3.1-8B) to achieve reasoning performance comparable to that of GPT-4-Turbo. The source code is available on https://stevetantan.github.io/HydraRAG/.


Aligning ESG Controversy Data with International Guidelines through Semi-Automatic Ontology Construction

Iwata, Tsuyoshi, Comte, Guillaume, Flores, Melissa, Kondo, Ryoma, Hisano, Ryohei

arXiv.org Artificial Intelligence

The growing importance of environmental, social, and governance data in regulatory and investment contexts has increased the need for accurate, interpretable, and internationally aligned representations of non-financial risks, particularly those reported in unstructured news sources. However, aligning such controversy-related data with principle-based normative frameworks, such as the United Nations Global Compact or Sustainable Development Goals, presents significant challenges. These frameworks are typically expressed in abstract language, lack standardized taxonomies, and differ from the proprietary classification systems used by commercial data providers. In this paper, we present a semi-automatic method for constructing structured knowledge representations of environmental, social, and governance events reported in the news. Our approach uses lightweight ontology design, formal pattern modeling, and large language models to convert normative principles into reusable templates expressed in the Resource Description Framework. These templates are used to extract relevant information from news content and populate a structured knowledge graph that links reported incidents to specific framework principles. The result is a scalable and transparent framework for identifying and interpreting non-compliance with international sustainability guidelines.


OntView: What you See is What you Meant

Bobed, Carlos, Quintana, Carlota, Mena, Eduardo, Bobed, Jorge, Bobillo, Fernando

arXiv.org Artificial Intelligence

In the field of knowledge management and computer science, ontologies provide a structured framework for modeling domain-specific knowledge by defining concepts and their relationships. However, the lack of tools that provide effective visualization is still a significant challenge. While numerous ontology editors and viewers exist, most of them fail to graphically represent ontology structures in a meaningful and non-overwhelming way, limiting users' ability to comprehend dependencies and properties within large ontological frameworks. In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface. Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge. One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools. Moreover, to avoid a possible information overload, Ontview also offers different ways to show a simplified view of the ontology by: 1) creating ontology summaries by assessing the importance of the concepts (according to different available algorithms), 2) focusing the visualization on the existing TBox elements between two given classes and 3) allowing to hide/show different branches in a dynamic way without losing the semantics. OntView has been released with an open-source license for the whole community.


A Global Multi-Unit Calibration as a Method for Large Scale IoT Particulate Matter Monitoring Systems Deployments

De Vito, Saverio, Elia, Gerardo D, Ferlito, Sergio, Di Francia, Girolamo, Davidovic, Milos, Kleut, Duska, Stojanovic, Danka, Stojanovic, Milena Jovasevic

arXiv.org Artificial Intelligence

Scalable and effective calibration is a fundamental requirement for Low Cost Air Quality Monitoring Systems and will enable accurate and pervasive monitoring in cities. Suffering from environmental interferences and fabrication variance, these devices need to encompass sensors specific and complex calibration processes for reaching a sufficient accuracy to be deployed as indicative measurement devices in Air Quality (AQ) monitoring networks. Concept and sensor drift often force calibration process to be frequently repeated. These issues lead to unbearable calibration costs which denies their massive deployment when accuracy is a concern. In this work, We propose a zero transfer samples, global calibration methodology as a technological enabler for IoT AQ multisensory devices which relies on low cost Particulate Matter (PM) sensors. This methodology is based on field recorded responses from a limited number of IoT AQ multisensors units and machine learning concepts and can be universally applied to all units of the same type. A multi season test campaign shown that, when applied to different sensors, this methodology performances match those of state of the art methodology which requires to derive different calibration parameters for each different unit. If confirmed, these results show that, when properly derived, a global calibration law can be exploited for a large number of networked devices with dramatic cost reduction eventually allowing massive deployment of accurate IoT AQ monitoring devices. Furthermore, this calibration model could be easily embedded on board of the device or implemented on the edge allowing immediate access to accurate readings for personal exposure monitor applications as well as reducing long range data transfer needs.


Document Automation Architectures: Updated Survey in Light of Large Language Models

Achachlouei, Mohammad Ahmadi, Patil, Omkar, Joshi, Tarun, Nair, Vijayan N.

arXiv.org Artificial Intelligence

This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically creating and integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies. The current survey of DA reviews the academic literature and provides a clearer definition and characterization of DA and its features, identifies state-of-the-art DA architectures and technologies in academic research, and provides ideas that can lead to new research opportunities within the DA field in light of recent advances in generative AI and large language models.


Understanding CNN Hidden Neuron Activations Using Structured Background Knowledge and Deductive Reasoning

Dalal, Abhilekha, Sarker, Md Kamruzzaman, Barua, Adrita, Vasserman, Eugene, Hitzler, Pascal

arXiv.org Artificial Intelligence

A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would provide insights into the question of what a deep learning system has internally detected as relevant on the input, demystifying the otherwise black-box character of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. In this paper, we provide such a method and demonstrate that it provides meaningful interpretations. Our approach is based on using large-scale background knowledge approximately 2 million classes curated from the Wikipedia concept hierarchy together with a symbolic reasoning approach called Concept Induction based on description logics, originally developed for applications in the Semantic Web field. Our results show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network through a hypothesis and verification process.


A Modular Ontology for MODS -- Metadata Object Description Schema

Rayan, Rushrukh, Shimizu, Cogan, Sieverding, Heidi, Hitzler, Pascal

arXiv.org Artificial Intelligence

The Metadata Object Description Schema (MODS) was developed to describe bibliographic concepts and metadata and is maintained by the Library of Congress. Its authoritative version is given as an XML schema based on an XML mindset which means that it has significant limitations for use in a knowledge graphs context. We have therefore developed the Modular MODS Ontology (MMODS-O) which incorporates all elements and attributes of the MODS XML schema. In designing the ontology, we adopt the recent Modular Ontology Design Methodology (MOMo) with the intention to strike a balance between modularity and quality ontology design on the one hand, and conservative backward compatibility with MODS on the other.


Scattering Spectra Models for Physics

Cheng, Sihao, Morel, Rudy, Allys, Erwan, Ménard, Brice, Mallat, Stéphane

arXiv.org Artificial Intelligence

Physicists routinely need probabilistic models for a number of tasks such as parameter inference or the generation of new realizations of a field. Establishing such models for highly non-Gaussian fields is a challenge, especially when the number of samples is limited. In this paper, we introduce scattering spectra models for stationary fields and we show that they provide accurate and robust statistical descriptions of a wide range of fields encountered in physics. These models are based on covariances of scattering coefficients, i.e. wavelet decomposition of a field coupled with a point-wise modulus. After introducing useful dimension reductions taking advantage of the regularity of a field under rotation and scaling, we validate these models on various multi-scale physical fields and demonstrate that they reproduce standard statistics, including spatial moments up to 4th order. These scattering spectra provide us with a low-dimensional structured representation that captures key properties encountered in a wide range of physical fields. These generic models can be used for data exploration, classification, parameter inference, symmetry detection, and component separation.


Web of Things and Trends in Agriculture: A Systematic Literature Review

Farooq, Muhammad Shoaib, Riaz, Shamyla, Alvi, Atif

arXiv.org Artificial Intelligence

In the past few years, the Web of Things (WOT) became a beneficial game-changing technology within the Agriculture domain as it introduces innovative and promising solutions to the Internet of Things (IoT) agricultural applications problems by providing its services. WOT provides the support for integration, interoperability for heterogeneous devices, infrastructures, platforms, and the emergence of various other technologies. The main aim of this study is about understanding and providing a growing and existing research content, issues, and directions for the future regarding WOT-based agriculture. Therefore, a systematic literature review (SLR) of research articles is presented by categorizing the selected studies published between 2010 and 2020 into the following categories: research type, approaches, and their application domains. Apart from reviewing the state-of-the-art articles on WOT solutions for the agriculture field, a taxonomy of WOT-base agriculture application domains has also been presented in this study. A model has also presented to show the picture of WOT based Smart Agriculture. Lastly, the findings of this SLR and the research gaps in terms of open issues have been presented to provide suggestions on possible future directions for the researchers for future research.


How AI services can change the face of Custom Web Development?

#artificialintelligence

There are more than four billion active internet users globally, thereby becoming the single largest target group of potential customers for businesses. It is not surprising that businesses are trying to establish a robust online presence -- the most popular mode being through a fully-functional website. Thus, they are availing custom web development solutions from expert developers. Such is the potential of the internet that no business can afford to overlook it. Irrespective of their size, businesses are trying to make their websites more interactive.